https://www.meetup.com/futureofdata-newyork/ AN OPEN SOURCE COMMUNITY
Cloud to Analytics to Cloud Storage to Fast Data to
Machine Learning to Microservices to ...
My Talk List


Utilizing Real-Time Transit Data for Travel Optimization 


Let's Monitor the Conditions at the Conference 
Agenda


Apache NiFi has a lot of new features, processors and best practices that have arrived 
in the last year or so. 


I will walk through building flows using the latest tips, techniques and processors.
I will build and change a number of data flows utilizing the latest NiFi version and point out
gotchas and some never dos. The deck will act as a take-away with notes, tips and
guides to what we covered.


===> Any NiFi 1.23+ and 2.0 in progress features people want to see? 
New ExcelRecord Reader


AmazonGlueSchemaRegistry 


https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12316020&version=12353320
New to 2023 Processors

GenerateRecord
GetAsanaObject
PutSalesforceObject
QuerySalesforceObject
PutIoTDBRecord
QueryIoTDBRecord
ListGoogleDrive
FetchGoogleDrive
PutGoogleDrive
PutBoxFile
ListBoxFile
FetchBoxFile
PutDropbox
DecryptContent
DecryptContentCompatibility

https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12316020&version=12353320
New to 2023 Processors


ExtractRecordSchema 
RemoveRecordField 
VerifyContentMAC 
TriggerHiveMetaStoreEvent 


"count" function added to RecordPath 
AWS ML Service Processors

AWS Transcribe
AWS Textract
AWS Translate
AWS Polly

https://github.com/tspannhw/FLaNK-AWSML
AWS Translate

[Flow screenshot, NiFi 1.23.1.2.1.6.0-323. Processors and connections shown:]
UpdateAttribute "Set Values" (org.apache.nifi - nifi-update-attribute-nar), connection: success
StartAwsTranslateJob (org.apache.nifi - nifi-aws-nar), connections: success, failure
GetAwsTranslateJobStatus (org.apache.nifi - nifi-aws-nar), connections: success, failure, and "running, throttled"
ControlRate (org.apache.nifi - nifi-standard-nar), fed by the "running, throttled" connection, connection: success
EvaluateJsonPath "Rebuild Record" (org.apache.nifi - nifi-standard-nar), connection: matched
Failures are collected under "ErrorsTranslate".
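The flow above starts a translate job, checks its status, and sends the "running, throttled" results through ControlRate before checking again. A minimal Python sketch of that polling pattern follows. It is not NiFi or AWS code: `wait_for_job`, `get_status`, and the "timeout" result are hypothetical names chosen for illustration, and the status source is simulated.

```python
import time
from typing import Callable


def wait_for_job(get_status: Callable[[], str],
                 poll_interval_s: float = 1.0,
                 max_polls: int = 10,
                 sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until the job leaves 'running'/'throttled' or the poll budget is used up."""
    for _ in range(max_polls):
        status = get_status()
        if status not in ("running", "throttled"):
            return status  # e.g. 'success' or 'failure'
        # Pause between checks; this plays the role ControlRate has in the flow.
        sleep(poll_interval_s)
    return "timeout"  # hypothetical outcome, not a NiFi relationship


# Simulated job: reported as running, then throttled, then finished.
_statuses = iter(["running", "throttled", "success"])
result = wait_for_job(lambda: next(_statuses), sleep=lambda seconds: None)
print(result)
```

Passing `sleep` as a parameter keeps the sketch testable without real delays; in a real flow the delay is whatever rate ControlRate is configured with.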
Deprecating for Removal

Deprecate Lua and Ruby Script Engines 

Deprecate ECMAScript Script Engine 

Deprecate the Ambari Reporting Task 

Deprecate Kafka 1.x components and 2.0 components 
XML Templates 

Variables 


See: 


https://cwiki.apache.org/confluence/display/NIFI/Deprecated+Components+and+Features
ExecuteStateless -> run your stateless flows right in a regular NiFi cluster


Parameters 
JSON Flow Serialization 


Records everywhere 
NiFi 2.0 Coming


Python Integration 

Parameters 

JDK 17, maybe JDK 21+ 

JSON Flow Serialization 

Rules Engine for Development Assistance 
Run Process Group as Stateless 
flow.json.gz 


https://cwiki.apache.org/confluence/display/NIFI/NiFi+2.0+Release+Goals 
https://medium.com/cloudera-inc/getting-ready-for-apache-nifi-2-0-5a5e6a67f450
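Since the flow definition is serialized as gzip-compressed JSON (flow.json.gz), it can be inspected with ordinary tooling. Here is a minimal sketch using only the Python standard library. The helper name `load_flow` is hypothetical, and the demo file below is a made-up stand-in: nothing here assumes the real schema of a NiFi flow file.

```python
import gzip
import json
import os
import tempfile


def load_flow(path):
    """Read a gzip-compressed JSON file and return the parsed object."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)


# Made-up stand-in content; a real flow.json.gz has its own structure.
sample = {"example": {"name": "demo-flow", "items": []}}

with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "flow.json.gz")
    with gzip.open(demo_path, "wt", encoding="utf-8") as handle:
        json.dump(sample, handle)
    flow = load_flow(demo_path)

# Listing the top-level keys is a safe first look at an unfamiliar file.
print(sorted(flow.keys()))
```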
Thanks to Pierre!

Pierre Villard
Apache NiFi Committer & PMC member
Working @Cloudera - ex-@Google
Twitter & Github: @pvillard31
Blog: www.pierrevillard.com
Python as First Class (NIFI-11241)

[Flow screenshot: GetFile (org.apache.nifi - nifi-standard-nar) -> success -> DetectObjectInImage 0.0.1-SNAPSHOT (org.apache.nifi - python-extensions)]

import cv2
import numpy as np
import json
from nifiapi.properties import PropertyDescriptor
from nifiapi.properties import ResourceDefinition
from nifiapi.flowfiletransform import FlowFileTransformResult

SCALE_FACTOR = 0.00392
NMS_THRESHOLD = 0.4  # non-maximum suppression threshold
CONFIDENCE_THRESHOLD = 0.5

class DetectObjectInImage:
    class Java:
        implements = ['org.apache.nifi.python.processor.FlowFileTransform']

    class ProcessorDetails:
        version = '0.0.1-SNAPSHOT'
        dependencies = ['numpy >= 1.23.5', 'opencv-python >= 4.6']

    def __init__(self, jvm=None, **kwargs):
        self.jvm = jvm

        # Build Property Descriptors
        self.model_file = PropertyDescriptor(
            name = 'Model File',
            description = 'The binary file containing the trained Deep Neural Network weights. Supports Caffe (*.caffemodel), TensorFlow (*.pb), Torch (*.t7, *.net), Darknet (*.weights), '
                          'DLDT (*.bin), and ONNX (*.onnx)',
            required = True,
            resource_definition = ResourceDefinition(allow_file = True)
        )
        self.config_file = PropertyDescriptor(
            name = 'Network Config File',
            description = 'The text file containing the Network configuration. Supports Caffe (*.prototxt), TensorFlow (*.pbtxt), Darknet (*.cfg), and DLDT (*.xml)',
            required = False,
            resource_definition = ResourceDefinition(allow_file = True)
        )
        self.class_name_file = PropertyDescriptor(
            name = 'Class Names File',
            description = 'A text file containing the names of the classes that may be detected by the model. Expected format is one class name per line, new-line terminated.',
            required = True,
            resource_definition = ResourceDefinition(allow_file = True)
        )
        self.descriptors = [self.model_file, self.config_file, self.class_name_file]

    def getPropertyDescriptors(self):
        return self.descriptors

    def onScheduled(self, context):
        # read class names from text file
        class_name_file = context.getProperty(self.class_name_file.name).getValue()
        if class_name_file is None:
            ...  # the listing on the slide ends here
Apache NiFi in a few numbers

A very active project with a dynamic community & comparison with ACEU 2019

2800+ members on the Slack channel (535+ - 4 years ago)
475+ contributors on Github across the repositories (260+ - 4 years ago)
65 committers in the Apache NiFi community (45 - 4 years ago)
Apache NiFi 1.23.2 is the latest release, NiFi 2.0 coming soon (NiFi 1.10 - 4 years ago)
14M+ docker pulls of the Apache NiFi image (1M+ - 4 years ago)
NiFi Deploy Options from Open Source to Managed

MiNiFi C++ (small footprint)
MiNiFi Java (headless version of NiFi)
NiFi Registry
Stateless NiFi
Cloudera Edge Flow Manager (Command & Control of MiNiFi Agents)
Cloudera NiFi for Kafka Connect
NiFi in Cloudera DataFlow Functions
Cloudera DataFlow
NiFi 2.0 is coming... https://medium.com/cloudera-inc/getting-ready-for-apache-nifi-2-0-5a5e6a67f450

- First-class citizen Python API
- Rules Engine
- NiFi Stateless at Process Group level
- Java 21 (virtual threads, perf improvements, etc)

Evernote AI Chatbot Powered by Apache NiFi using OpenAI, Pinecone & Langchain

[Architecture diagram: Python, Evernote SDK, LangChain, OpenAI, Pinecone]
Export all Notes from Evernote -> Data Lake
Each note chunked into optimized size for Embedding Model (Chunk 1, Chunk 2, Chunk 3, ... Chunk X)
Create Vector Embeddings for each chunk using services like OpenAI
Upsert Vector into a Vector Database like Pinecone

[Chat screenshot:]
Q: When did my right elbow pain occur?
A: The right elbow pain occurred on January 11, 2023.
Evernote Source: Note: Right Elbow Pain - 01-11-23 (Notebook: Family Aju Health)

Q: What doctor did I see for it?
A: You saw Dr. Patel for your right elbow pain.
Evernote Source: Note: Appointment with Dr. Patel on 6_20 for Right Elbow Pain (Notebook: Family Aju Health)

Q: What was the cause for the right elbow pain and what did the doctor suggest?
A: The cause for the right elbow pain is tennis elbow. The doctor suggested a cortisone shot in the right elbow and taking Diclofenac for prescription pain relief.
Evernote Source: Note: Appointment with Dr. Patel on 6_20 for Right Elbow Pain (Notebook: Family Aju Health), Note: Right Elbow Pain - 01-11-23 (Notebook: Family Aju Health)

Q: What was the dosage for Diclofenac?
A: The dosage for Diclofenac is 50 mg, 3 times a week.
Evernote Source: Note: Medications & Prescriptions That I Take (Notebook: Family Aju Health)

Closing the gap between data engineers and data scientists...
- Export documentation (Sharepoint, OCR) to build the knowledge base powering your chatbot
- Scrape the internet (Sitemap) to build the knowledge base powering your chatbot
- Real-time streaming ingest of Slack to build the knowledge base powering your chatbot
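The knowledge-base pipeline above depends on splitting each note into chunks sized for the embedding model before vectors are created and upserted. Here is a minimal sketch of that chunking step only, in plain Python. The function name `chunk_text`, the character-based sizing, the overlap, and the sample note are all assumptions for illustration; the slide does not say how the chunks were actually sized, and no OpenAI or Pinecone call is made here.

```python
def chunk_text(text, max_chars=200, overlap=20):
    """Split text into windows of at most max_chars, each sharing `overlap` chars with the previous one."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    if not 0 <= overlap < max_chars:
        raise ValueError("overlap must be in [0, max_chars)")
    step = max_chars - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            break  # the last window already reached the end of the text
    return chunks


# Made-up note content, standing in for an exported note.
notes = {"Sample note": "NiFi flows can export notes, chunk them, and hand each chunk to an embedding service."}

# One record per chunk, keeping the note title so answers can cite their source.
records = [
    {"note": title, "chunk": index, "text": chunk}
    for title, body in notes.items()
    for index, chunk in enumerate(chunk_text(body, max_chars=40, overlap=5))
]
for record in records:
    print(record["note"], record["chunk"], repr(record["text"]))
```

Keeping the note title on every chunk record is what lets a chatbot answer with a source line, as in the chat example; the overlap is one common way to avoid cutting a fact in half at a chunk boundary.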
DEMO

[Canvas screenshot: the new processors on a NiFi 1.23.1.2.1.6.0-323 canvas, all idle (0 bytes in and out over 5 min). Processors shown, grouped by bundle:]
org.apache.nifi - nifi-standard-nar: QueryRecord, RemoveRecordField, GenerateRecord
org.apache.nifi - nifi-asana-processors-nar: GetAsanaObject
org.apache.nifi - nifi-salesforce-nar: PutSalesforceObject, QuerySalesforceObject
org.apache.nifi - nifi-iotdb-nar: PutIoTDBRecord, QueryIoTDBRecord
org.apache.nifi - nifi-cipher-nar: VerifyContentMAC, DecryptContent
org.apache.nifi - nifi-gcp-nar: ListGoogleDrive, FetchGoogleDrive, PutGoogleDrive
org.apache.nifi - nifi-box-nar: ListBoxFile, FetchBoxFile, PutBoxFile
org.apache.nifi - nifi-dropbox-processors-nar: ListDropbox, FetchDropbox, PutDropbox
org.apache.nifi - nifi-stateless-processor-nar: ExecuteStateless